Josh Quan
UC Berkeley Library
Fall 2017
An approximate answer to the right question is worth a great deal more than a precise answer to the wrong question.
-John Tukey
How feasible or doable is your research question?
Can you answer the question with a simple descriptive statistic (like an average, median, percentage, etc.)? If so, then your research question might be too narrow.
How many observations do you need?
Does the answer to your question have too many angles? If so, then your question might be too broad to answer on time.
| Unit of Analysis | Geography | Time-Period | Frequency |
|---|---|---|---|
| For which level do you want data? Summary or micro? (individuals, counties, nations) | Is there a geographic component to your topic? (U.S., Sub-Saharan Africa, India) | Do you want data for a specific time period? (1980-2000, 1930-1960) | How often do you want measures for your variables? (every year, every ten years, monthly, quarterly) |
| Researchers | Government Agencies | NGOs | Research Organizations |
|---|---|---|---|
| Are there people you know who are doing this kind of research? | Think about government agencies - is the request for some official statistics or data that they’d be likely to collect and publish? (industry, agriculture, construction, disease, crime) | Are there councils or interest organizations devoted to the topic that might collect data independently? (HIV/AIDS, drugs, civil rights) | Would any specific research organizations be interested in the topic? (Pew, Roper, Gallup, NORC, NBER, World Bank, OECD) |
It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data. -Dasu and Johnson, 2003
http://statbel.fgov.be/en/statistics/figures/economy/indicators/prix_prod_con/
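library(rvest)  # provides read_html(), html_nodes(), html_text() and the %>% pipe
url <- 'http://statbel.fgov.be/en/statistics/figures/economy/indicators/prix_prod_con/'
page <- read_html(url)  # download and parse the page once
TAB <- page %>% html_nodes('td') %>% html_text()    # text of every table cell
NAMES <- page %>% html_nodes('th') %>% html_text()  # text of every header cell
M <- data.frame(matrix(TAB, ncol = 5, nrow = 9, byrow = TRUE))  # 9 years x 5 columns (I-IV, Year)
M <- cbind(NAMES[7:15], M)  # prepend the year labels
names(M) <- NAMES[1:6]      # apply the column headers
M
## Gross indices (2010=100) I II III IV Year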
## 1 2008 99.9 101.2 101.0 102.3 101.1
## 2 2009 101.0 99.7 100.5 98.9 100.0
## 3 2010 99.4 99.8 100.0 100.8 100.0
## 4 2011 102.9 103.2 104.5 105.1 103.9
## 5 2012 105.7 106.1 106.0 105.6 105.9
## 6 2013 105.4 105.4 106.7 107.1 106.1
## 7 2014 107.3 107.2 107.4 107.6 107.4
## 8 2015 108.6 108.8 109.3 109.5 109.1
## 9 2016 110.3 110.7 110,8 111,3 110.8
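Scraped cells arrive as text, and the last row above mixes decimal commas (110,8 and 111,3) with decimal points, so the table cannot be used numerically as it stands. The following is a minimal sketch of one way to clean it, not the author's method. The data frame `raw` is a hypothetical stand-in for the last two rows of `M`, the leading non-breaking space in one cell is an assumption about what the page contains, and `clean_number()` is an illustrative helper rather than a function from rvest or base R.

```r
# Hypothetical stand-in for the last two rows of the scraped table M.
# Every cell is a character string, because html_text() returns text.
raw <- data.frame(
  year = c("2015", "2016"),
  I    = c("108.6", "110.3"),
  II   = c("108.8", "110.7"),
  III  = c("109.3", "110,8"),
  IV   = c("109.5", "\u00a0111,3"),  # assumed stray non-breaking space
  Year = c("109.1", "110.8"),
  stringsAsFactors = FALSE
)

# Illustrative helper (not part of rvest or base R).
clean_number <- function(x) {
  x <- gsub("[^0-9,.]", "", x)         # keep only digits, commas and points
  x <- sub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(x)
}

clean <- raw
clean[] <- lapply(raw, clean_number)   # convert every column, keep the data frame shape
str(clean)
```

Stripping every character that is not a digit, comma, or point is a blunt choice: it would also remove minus signs, and it would misread values that use a comma or point as a thousands separator, so it only suits columns like these that hold small positive indices.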